Starcoder2 model - bis #29215
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
ArthurZucker
left a comment
In #29228 I mention that static cache is not a blocker for the PR 😉
younesbelkada
left a comment
Thanks ! LGTM once we add the docs !
Starcoder2 has been released with the paper [Stacoder-2](https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view) by BigCode team.
Documentation page about the model is coming soon
This is fairly short. Let's add that the main difference from Mistral is dropout. Since you are the authors, it would be nice to explain how much this influenced training, for example.
yeah makes sense, I will take care of that after the official release
Thank you for the PR @RaymondLi0. cc @loubnabnl and @lvwerra: next time, let's make sure the doc and paper links are fully completed before we merge. We require this for every model / organisation regardless of the release date 😉 Let's make this an exception.
The Starcoder2 model, adapted from Mistral.
All changes are done through options, so Mistral itself is still supported.
Main changes:
* Embedding and residual dropout
It does not support absolute embeddings, so it can't support Santacoder or Starcoder.
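For readers unfamiliar with the two dropout variants listed above, here is a minimal, dependency-free sketch of the general technique. It is not the code from this PR: the helper names (`dropout`, `embed`, `residual_add`) and the toy embedding table are hypothetical, and the real model would apply the same idea to tensors rather than Python lists.

```python
import random


def dropout(values, p, training, rng=random):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)  # keeps the expected value unchanged
    return [v * scale if rng.random() >= p else 0.0 for v in values]


def embed(token_ids, table, p, training, rng=random):
    """Embedding dropout: look up each token vector, then apply dropout to it."""
    return [dropout(table[t], p, training, rng) for t in token_ids]


def residual_add(hidden, sublayer, p, training, rng=random):
    """Residual dropout: drop the sublayer output before adding it back."""
    out = dropout(sublayer(hidden), p, training, rng)
    return [h + o for h, o in zip(hidden, out)]


# Toy data (hypothetical): a two-token embedding table and a stand-in sublayer.
table = {0: [1.0, 2.0], 1: [3.0, 4.0]}
double = lambda xs: [2.0 * x for x in xs]

# At inference time dropout is a no-op, so the residual sum is deterministic.
eval_out = residual_add([1.0, 2.0], double, p=0.5, training=False)

# During training each sublayer output is either dropped or scaled by 1 / (1 - p).
train_out = residual_add([1.0, 2.0], double, p=0.5, training=True, rng=random.Random(0))
```

With both probabilities set to zero the two helpers reduce to a plain lookup and a plain residual sum, which fits the description's point that the changes are exposed as options.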
Starcoder2-3B model: https://huggingface.co/bigcode/starcoder2-3b
Todo:
[Core generation] Adds support for static KV cache #27931, [CLeanup] Revert SDPA attention changes that got in the static kv cache PR #29027 (and future changes from Feb. 19) (in a future PR?)

@younesbelkada @ArthurZucker @jlamypoirier